Feature Hierarchy

Clusters Beat Trend!? Testing Feature Hierarchy in Statistical Graphics

Susan VanderPlas

Iowa State University

Graphics and Perception

The greatest value of a picture is when it forces us to notice what we never expected to see.

John Tukey

Gestalt Laws of Perception


The whole is different than the sum of the parts

Gestalt Plots

How do plot aesthetics (color, shape, trend lines, error bands) change our perception of the plotted data?

Statistical Lineups

Which plot is the most different?

Null plot data is from a data-generating method consistent with the null hypothesis

The nullabor package helps with null data creation
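As a rough illustration of the lineup idea, here is a minimal Python sketch that hides one real data panel among null panels. It assumes permutation of the y values as the null-generating method and 20 panels per lineup; neither choice is stated on the slide. The helper `permutation_lineup` is hypothetical and is not part of nullabor.

```python
import random

def permutation_lineup(x, y, m=20, seed=None):
    """Return m panels of (x, y) pairs plus the 0-based position of the real data.

    Null panels permute y, which breaks any x-y association while keeping
    both marginal distributions intact (an assumed null-generating method).
    """
    rng = random.Random(seed)
    target_pos = rng.randrange(m)  # where the real data is hidden
    panels = []
    for pos in range(m):
        if pos == target_pos:
            panels.append(list(zip(x, y)))
        else:
            shuffled = list(y)
            rng.shuffle(shuffled)
            panels.append(list(zip(x, shuffled)))
    return panels, target_pos

x = [i / 10 for i in range(30)]
y = [2 * xi for xi in x]
panels, target_pos = permutation_lineup(x, y, seed=1)
```

A viewer who picks `target_pos` without knowing it provides evidence against the null hypothesis of no association.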

Two-Target Lineups

5, 12

Data Generating Mechanism

Linear Model

Parameter: \(\sigma_T\), the amount of variability around the trend line

  1. Generate evenly spaced \(x_i\) in \([-1, 1]\)
  2. Jitter \(x_i\)
  3. Generate \(y_i = x_i + e_i\), \(e_i \sim N(0, \sigma_T^2)\)
  4. Center and scale \(x_i, y_i\)
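The four steps above can be sketched in Python as follows. This is a sketch under stated assumptions, not the author's code: the jitter is taken to be uniform with a width of half the grid spacing, which the slide does not specify, and the names `simulate_trend` and `standardize` are hypothetical.

```python
import random
import statistics

def standardize(values):
    """Center to mean 0 and scale to unit sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def simulate_trend(n, sigma_t, seed=None):
    rng = random.Random(seed)
    # Step 1: evenly spaced x in [-1, 1]
    step = 2 / (n - 1)
    x = [-1 + i * step for i in range(n)]
    # Step 2: jitter x (ASSUMPTION: uniform jitter, half the grid spacing)
    x = [xi + rng.uniform(-step / 2, step / 2) for xi in x]
    # Step 3: y = x + e, with e ~ N(0, sigma_t^2)
    y = [xi + rng.gauss(0, sigma_t) for xi in x]
    # Step 4: center and scale both coordinates
    return standardize(x), standardize(y)

trend_x, trend_y = simulate_trend(n=45, sigma_t=0.25, seed=1)
```

Smaller values of `sigma_t` give a tighter band around the line and so a stronger visible trend.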


Cluster Model

Parameters: \(K\) clusters, \(\sigma_C\) cluster variability

  1. Generate \(K\) cluster centers \(c^x,c^y\) on a \(K\times K\) grid such that \(cor(c^x, c^y) \in [.25, .75]\)
  2. Center and standardize \(c^x, c^y\)
  3. Determine cluster size \(g_1, ..., g_K \sim Multinomial(K, p)\)
  4. Generate points around cluster centers: \((x_i, y_i) = (c^x_{g_i}, c^y_{g_i}) + (e_i^x, e_i^y)\) where \(e_i^x, e_i^y \sim N(0, \sigma_C^2)\)
  5. Center and scale \(x_i, y_i\)
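A minimal Python sketch of the cluster steps above follows. Several details are assumptions made for illustration only: centers are placed by permuting the grid rows (one center per row and column) and redrawn until the correlation falls in the stated range, cluster probabilities are equal, and each of the \(N\) points is assigned to a cluster so that the sizes sum to \(N\). The names `simulate_clusters`, `pearson`, and `standardize` are hypothetical.

```python
import random
import statistics

def standardize(values):
    """Center to mean 0 and scale to unit sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    num = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    den = (sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b)) ** 0.5
    return num / den

def simulate_clusters(n, k, sigma_c, seed=None, max_tries=10_000):
    rng = random.Random(seed)
    # Step 1: centers on a K x K grid (ASSUMPTION: a random permutation of
    # rows, redrawn until the correlation lies in [0.25, 0.75])
    cx = list(range(1, k + 1))
    for _ in range(max_tries):
        cy = rng.sample(cx, k)
        if 0.25 <= pearson(cx, cy) <= 0.75:
            break
    else:
        raise ValueError("no center layout with correlation in [0.25, 0.75]")
    # Step 2: center and standardize the cluster centers
    cx, cy = standardize(cx), standardize(cy)
    # Step 3: cluster membership (ASSUMPTION: equal probabilities, n draws)
    groups = [rng.randrange(k) for _ in range(n)]
    # Step 4: scatter points around their cluster center
    x = [cx[g] + rng.gauss(0, sigma_c) for g in groups]
    y = [cy[g] + rng.gauss(0, sigma_c) for g in groups]
    # Step 5: center and scale
    return standardize(x), standardize(y), groups

clust_x, clust_y, clust_g = simulate_clusters(n=45, k=3, sigma_c=0.2, seed=2)
```

The correlation constraint keeps the cluster layout from looking either like a clean line or like no association at all.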


Mixture Model

\(n_C\) points from the cluster model \(M_C\) and \(n_T = N - n_C\) points from the linear (trend) model \(M_T\), where \(n_C \sim Binomial(N, \lambda)\)
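The binomial split can be sketched in Python as below. The sketch assumes that a full set of \(N\) points has already been simulated from each model and that the mixture subsamples from both; the slide does not say how the points are actually drawn. `mix_models` is a hypothetical helper.

```python
import random

def mix_models(cluster_points, trend_points, lam, seed=None):
    """Combine points from the cluster and trend models.

    Both inputs hold N points. The number taken from the cluster model is
    Binomial(N, lam), drawn here as N Bernoulli trials.
    """
    rng = random.Random(seed)
    n = len(cluster_points)
    if len(trend_points) != n:
        raise ValueError("both models must supply the same number of points")
    n_cluster = sum(rng.random() < lam for _ in range(n))
    n_trend = n - n_cluster
    # ASSUMPTION: subsample without replacement from each simulated set
    mixed = rng.sample(cluster_points, n_cluster) + rng.sample(trend_points, n_trend)
    return mixed, n_cluster

cluster_pts = [(0.0, float(i)) for i in range(10)]
trend_pts = [(1.0, float(i)) for i in range(10)]
mixed, n_cluster = mix_models(cluster_pts, trend_pts, lam=0.5, seed=3)
```

With `lam` near 1 the result is dominated by cluster structure, and with `lam` near 0 by the linear trend.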

Groups created by k-means clustering
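For readers unfamiliar with k-means, here is a small pure-Python sketch of Lloyd's algorithm for 2-D points. It illustrates the general technique only and is not the implementation used for the study; the function name `kmeans` and its `init` parameter are hypothetical.

```python
import random

def kmeans(points, k, init=None, iterations=100, seed=None):
    """Lloyd's algorithm: alternate nearest-center assignment and center updates."""
    rng = random.Random(seed)
    # Start from supplied centers, or from k distinct data points chosen at random
    centers = list(init) if init is not None else rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center
        new_labels = [
            min(range(k), key=lambda j: (p[0] - centers[j][0]) ** 2
                                        + (p[1] - centers[j][1]) ** 2)
            for p in points
        ]
        # Update step: move each center to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                centers[j] = (sum(m[0] for m in members) / len(members),
                              sum(m[1] for m in members) / len(members))
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, centers

blobs = [(0, 0), (0.1, 0), (0, 0.1), (10, 10), (10.1, 10), (10, 10.1)]
labels, centers = kmeans(blobs, k=2, init=[(0, 0), (10, 10)])
```

The resulting labels are what an aesthetic such as color or shape would be mapped to in a plot.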


Experimental Design - Data Parameters

18 combinations of plot parameters (\(2K \times 3\sigma_T \times 3\sigma_C\))
3 replicates of each parameter set = 54 total lineup data sets

Experimental Design - Plot Aesthetics

10 Aesthetics \(\times\) 54 data sets = 540 plots
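The design counts can be checked by enumerating the full factorial grid. The level labels below are placeholders, since the slides do not list the actual parameter values or aesthetic names.

```python
from itertools import product

# Placeholder labels only: the real parameter values are not given here
k_levels = ["K_level_1", "K_level_2"]
sigma_t_levels = ["sigma_T_1", "sigma_T_2", "sigma_T_3"]
sigma_c_levels = ["sigma_C_1", "sigma_C_2", "sigma_C_3"]
replicates = [1, 2, 3]
aesthetics = [f"aesthetic_{i}" for i in range(1, 11)]

parameter_sets = list(product(k_levels, sigma_t_levels, sigma_c_levels))
datasets = list(product(parameter_sets, replicates))
plots = list(product(aesthetics, datasets))

print(len(parameter_sets), len(datasets), len(plots))  # 18 54 540
```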

Experimental Design

Results

Most participants identified a mix of cluster and trend targets


Faceoff Model

\(C_{ijk} := \{\text{participant } k \text{ selects the cluster target for data set } j \text{ with aesthetic } i\}\)


\[\text{logit} P(C_{ijk}|C_{ijk}\cup T_{ijk}) = \mathbf{W}\alpha + \mathbf{X}\beta + \mathbf{J}\gamma + \mathbf{K}\eta\]
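To make the left-hand side concrete, the sketch below computes the empirical share of cluster selections among responses that picked either target, and converts between probability and logit scales. The response coding (`"cluster"`, `"trend"`, `"null"`) and the helper names are assumptions for illustration; fitting the mixed model itself is not shown.

```python
import math

def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1 - p))

def inv_logit(eta):
    """Map a linear predictor back to a probability."""
    return 1 / (1 + math.exp(-eta))

def cluster_share(responses):
    """Estimate P(C | C or T): drop responses that picked neither target."""
    kept = [r for r in responses if r in ("cluster", "trend")]
    if not kept:
        raise ValueError("no response selected either target")
    return sum(r == "cluster" for r in kept) / len(kept)

share = cluster_share(["cluster", "trend", "null", "cluster"])
log_odds = logit(share)
```

A positive `log_odds` means the cluster target was chosen more often than the trend target among responses that identified one of the two.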
